Task-Oriented Intrinsic Evaluation of Semantic Textual Similarity
Abstract
Semantic Textual Similarity (STS) is a foundational NLP task that can be used in a wide range of tasks. Hundreds of different STS systems exist for determining the STS of two texts; for an NLP system designer, however, it is hard to decide which system is best. To answer this question, an intrinsic evaluation of the STS systems is conducted by comparing the output of each system to human judgments of semantic similarity, usually using Pearson correlation. In this work, we show that relying on intrinsic evaluations with Pearson correlation can be misleading. In three common STS-based tasks, we observed that Pearson correlation was especially ill-suited to detecting the best STS system for the task, and that other evaluation measures were much better suited. We define how the validity of an intrinsic evaluation can be assessed and compare different intrinsic evaluation methods. Because understanding the properties of the targeted task is crucial, we propose a framework for conducting the intrinsic evaluation that takes these properties into account.
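As a rough illustration of why the choice of intrinsic measure matters, the following minimal sketch scores two hypothetical STS systems against toy human judgments. The abstract does not name the alternative measures it considers; Spearman's rank correlation is used here only as one plausible example. The scores and the names `gold`, `system_a`, and `system_b` are invented for illustration and are not data from the paper.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Toy data (hypothetical): human similarity judgments on a 0-5 scale.
gold = [0, 1, 2, 3, 4, 5]
# System A orders every pair correctly but on a strongly non-linear scale.
system_a = [0.0, 0.01, 0.02, 0.03, 0.04, 1.0]
# System B is roughly linear but swaps the order of neighbouring pairs.
system_b = [0.1, 0.0, 0.5, 0.4, 0.9, 0.8]

for name, scores in (("A", system_a), ("B", system_b)):
    print(name, round(pearson(gold, scores), 3), round(spearman(gold, scores), 3))
```

On this toy data, Pearson correlation prefers system B while the rank-based measure prefers system A. A task that only needs a correct ranking, such as retrieving the most similar text, would be better served by A, so the measure that best predicts task performance depends on the properties of the task.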
Similar resources
Robust semantic text similarity using LSA, machine learning, and linguistic resources
Semantic textual similarity is a measure of the degree of semantic equivalence between two pieces of text. We describe the SemSim system and its performance in the *SEM 2013 and SemEval-2014 tasks on semantic textual similarity. At the core of our system lies a robust distributional word similarity component that combines Latent Semantic Analysis and machine learning augmented with data from se...
Learning the Impact of Machine Translation Evaluation Metrics for Semantic Textual Similarity
We present a work to evaluate the hypothesis that automatic evaluation metrics developed for Machine Translation (MT) systems have significant impact on predicting semantic similarity scores in Semantic Textual Similarity (STS) task for English, in light of their usage for paraphrase identification. We show that different metrics may have different behaviors and significance along the semantic ...
Análise de Medidas de Similaridade Semântica na Tarefa de Reconhecimento de Implicação Textual (Analysis of Semantic Similarity Measures in the Recognition of Textual Entailment Task) [In Portuguese]
In this work, we present a feature-based approach to the RTE (Recognizing Text Entailment) task that verifies the similarity between two sentences including syntactic and semantic aspects. The selected features come from the winning work of the RTE task of the workshop ASSIN (Semantic Similarity Evaluation and Textual Inference) with some changes and addition of other semantic feature. The eval...
UoW: NLP techniques developed at the University of Wolverhampton for Semantic Similarity and Textual Entailment
This paper presents the system submitted by University of Wolverhampton for SemEval-2014 task 1. We proposed a machine learning approach which is based on features extracted using Typed Dependencies, Paraphrasing, Machine Translation evaluation metrics, Quality Estimation metrics and Corpus Pattern Analysis. Our system performed satisfactorily and obtained 0.711 Pearson correlation for the sema...
Intrinsic and extrinsic approaches to recognizing textual entailment
Recognizing Textual Entailment (RTE) is to detect an important relation between two texts, namely whether one text can be inferred from the other. For natural language processing, especially for natural language understanding, this is a useful and challenging task. We start with an introduction of the notion of textual entailment, and then define the scope of the recognition task. We summarize ...